Learning to sample from noise with deep generative models
Machine learning, and especially deep learning, has established itself in recent years as a way to solve a wide variety of tasks. One of its most remarkable applications is computer vision: detection and classification systems have made major advances thanks to deep learning. However, many obstacles remain on the way to an understanding of the world similar to that of living beings, which need no labels to classify or to extract features from the real world. Unsupervised learning is one of the research directions that focuses on solving this problem.
In this thesis, I present a new way to train neural networks in an unsupervised fashion: a method for sampling iteratively from noise in order to generate data close to the training data. This iterative procedure is called Infusion training, a new approach for learning the transition operator of a Markov chain.
In the first chapter, I introduce the basics of machine learning and probability theory. In the second chapter, I survey the generative models that inspired this work. In the third and last chapter, I show how to improve sampling in generative models with Infusion training.
Machine learning, and specifically deep learning, has made significant breakthroughs in recent years on a variety of tasks. One well-known application of deep learning is computer vision, where tasks such as detection and classification are considered nearly solved by the community. However, training state-of-the-art models for such tasks requires labels associated with the data we want to classify. A more general goal is, similarly to animal brains, to design algorithms that can extract meaningful features from unlabeled data. Unsupervised learning is one of the research directions that tries to solve this problem.
In this thesis, I present a new way to train a neural network as a generative model capable of generating quality samples (a task akin to imagining). I explain how, by starting from noise, it is possible to obtain samples that are close to the training data. This iterative procedure is called Infusion training and is a novel approach to learning the transition operator of a generative Markov chain.
In the first chapter, I present some background on machine learning and probabilistic models. The second chapter presents the generative models that inspired this work. The third and last chapter presents and investigates our novel approach to learning a generative model with Infusion training.
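The core idea described above (learning a transition operator that iteratively turns noise into data) can be illustrated with a toy sketch. This is not the thesis's actual Infusion procedure: the transition operator below is a hand-coded stand-in rather than a learned network, and the 1-D "dataset" is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "training data": points clustered around 3.0.
data_mean = 3.0

def transition(x, t, num_steps):
    """One step of a transition-operator-style update (a hand-coded
    stand-in for a learned operator): nudge the sample toward the data
    and add noise that anneals to zero over the chain."""
    pull = 0.5 * (data_mean - x)
    noise_scale = 1.0 - t / num_steps
    return x + pull + rng.normal(0.0, noise_scale)

def sample(num_steps=30):
    """Run the Markov chain, starting from pure noise."""
    x = rng.normal(0.0, 1.0)
    for t in range(num_steps):
        x = transition(x, t, num_steps)
    return x

samples = np.array([sample() for _ in range(200)])
print(samples.mean())  # should land close to the data mean, 3.0
```

Each chain starts far from the data and is progressively "infused" toward it; in the actual method, the pull toward the data is learned rather than hard-coded.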
Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations
Joint-embedding based learning (e.g., SimCLR, MoCo, DINO) and
reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading
paradigms for self-supervised learning of vision transformers, but they differ
substantially in their transfer performance. Here, we aim to explain these
differences by analyzing the impact of these objectives on the structure and
transferability of the learned representations. Our analysis reveals that
reconstruction-based learning features are significantly dissimilar to
joint-embedding based learning features and that models trained with similar
objectives learn similar features even across architectures. These differences
arise early in the network and are primarily driven by attention and
normalization layers. We find that joint-embedding features yield better linear
probe transfer for classification because the different objectives drive
different distributions of information and invariances in the learned
representation. These differences explain opposite trends in transfer
performance for downstream tasks that require spatial specificity in features.
Finally, we address how fine-tuning changes reconstructive representations to enable better transfer, showing that fine-tuning re-organizes the information to be more similar to pre-trained joint-embedding models.
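Linear probe transfer, the evaluation referred to above, means training only a linear classifier on top of frozen pretrained features. A minimal sketch, using synthetic Gaussian features as a stand-in for a frozen backbone's outputs (the data and dimensions here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen backbone features: two Gaussian classes in 16-D.
n, d = 200, 16
X = np.vstack([rng.normal(-1.0, 1.0, (n, d)),
               rng.normal(+1.0, 1.0, (n, d))])
y = np.array([0] * n + [1] * n)

# Linear probe: fit only a linear map on the frozen features
# (here via ridge-regularized least squares on +/-1 targets).
t = np.where(y == 1, 1.0, -1.0)
Xb = np.hstack([X, np.ones((2 * n, 1))])            # append a bias column
w = np.linalg.solve(Xb.T @ Xb + 1e-2 * np.eye(d + 1), Xb.T @ t)

pred = (Xb @ w > 0).astype(int)
acc = (pred == y).mean()
print(f"linear probe accuracy: {acc:.2f}")
```

Because the backbone stays frozen, probe accuracy directly reflects how linearly accessible the class information is in the representation, which is why the two SSL paradigms score differently under it.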
A surprisingly simple technique to control the pretraining bias for better transfer: Expand or Narrow your representation
Self-Supervised Learning (SSL) models rely on a pretext task to learn
representations. Because this pretext task differs from the downstream tasks
used to evaluate the performance of these models, there is an inherent
misalignment or pretraining bias. A commonly used trick in SSL, shown to make
deep networks more robust to such bias, is the addition of a small projector
(usually a 2 or 3 layer multi-layer perceptron) on top of a backbone network
during training. In contrast to previous work that studied the impact of the
projector architecture, we here focus on a simpler, yet overlooked lever to
control the information in the backbone representation. We show that merely
changing its dimensionality -- by changing only the size of the backbone's very
last block -- is a remarkably effective technique to mitigate the pretraining
bias. It significantly improves downstream transfer performance for both
Self-Supervised and Supervised pretrained models.
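The projector mentioned above sits between the backbone and the SSL loss during pretraining and is discarded afterwards. A minimal sketch of the setup, with random weights standing in for trained ones (all dimensions here are illustrative, not the paper's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_projector(dim_in, dim_hidden, dim_out):
    """A small 2-layer MLP projector, as commonly placed on top of an
    SSL backbone during pretraining (weights are random stand-ins)."""
    W1 = rng.normal(0, 0.1, (dim_in, dim_hidden))
    W2 = rng.normal(0, 0.1, (dim_hidden, dim_out))
    def forward(h):
        return np.maximum(h @ W1, 0.0) @ W2   # ReLU between the layers
    return forward

# The lever discussed above: the backbone's output dimensionality.
backbone_dim = 2048            # expand (e.g. 4096) or narrow (e.g. 512)
h = rng.normal(0, 1, (8, backbone_dim))   # batch of backbone features

project = mlp_projector(backbone_dim, 2048, 256)
z = project(h)                 # projector output fed to the SSL loss
print(h.shape, z.shape)        # downstream tasks use h, not z
```

The paper's point is that changing `backbone_dim` alone (the size of the last backbone block, whose output `h` is what transfers downstream) already controls the pretraining bias, independently of the projector's architecture.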
PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning
Synthetic image datasets offer unmatched advantages for designing and
evaluating deep neural networks: they make it possible to (i) render as many
data samples as needed, (ii) precisely control each scene and yield granular
ground truth labels (and captions), (iii) precisely control distribution shifts
between training and testing to isolate variables of interest for sound
experimentation. Despite such promise, the use of synthetic image data is still
limited -- and often played down -- mainly due to their lack of realism. Most
works therefore rely on datasets of real images, which have often been scraped
from public images on the internet and may raise issues of privacy, bias, and
copyright, while offering little control over how objects precisely appear. In
precisely appear. In this work, we present a path to democratize the use of
photorealistic synthetic data: we develop a new generation of interactive
environments for representation learning research, that offer both
controllability and realism. We use the Unreal Engine, a powerful game engine
well known in the entertainment industry, to produce PUG (Photorealistic Unreal
Graphics) environments and datasets for representation learning. In this paper,
we demonstrate the potential of PUG to enable more rigorous evaluations of
vision models.
On the use of a pulsed-laser source in laboratory seismic experiments
Reproducing large-scale seismic exploration at laboratory scale with controllable sources is a promising approach that could not only be applied to study small-scale physical properties of the medium, but could also contribute to significant progress in understanding wave propagation and imaging complex media at exploration scale via upscaling methods. We seek to characterize the properties of a laser-generated seismic source for new geophysical experiments at laboratory scale. This consists of generating seismic waves by pulsed-laser impacts and measuring the displacement wavefield by laser vibrometry. Parallel 2D/3D simulations using a Discontinuous Galerkin discretization method, together with analytic predictions, were performed to match the experimental data.
A Cookbook of Self-Supervised Learning
Self-supervised learning, dubbed the dark matter of intelligence, is a
promising path to advance machine learning. Yet, much like cooking, training
SSL methods is a delicate art with a high barrier to entry. While many
components are familiar, successfully training an SSL method involves a dizzying
set of choices from the pretext tasks to training hyper-parameters. Our goal is
to lower the barrier to entry into SSL research by laying the foundations and
latest SSL recipes in the style of a cookbook. We hope to empower the curious
researcher to navigate the terrain of methods, understand the role of the
various knobs, and gain the know-how required to explore how delicious SSL can
be.
Example-based wrinkle synthesis for clothing animation
This paper describes a method for animating the appearance of clothing, such as pants or a shirt, that fits closely to a figure's body. Compared to flowing cloth, such as loose dresses or capes, these types of garments involve nearly continuous collision contact and small wrinkles that can be troublesome for traditional cloth simulation methods. Based on the observation that the wrinkles in close-fitting clothing behave in a predominantly kinematic fashion, we have developed an example-based wrinkle synthesis technique. Our method drives wrinkle generation from the pose of the figure's kinematic skeleton. This approach allows high-quality clothing wrinkles to be combined with a coarse cloth simulation that computes the global and dynamic aspects of the clothing motion. While the combined results do not exactly match a high-resolution reference simulation, they do capture many of the characteristic fine-scale features and wrinkles. Further, the combined system runs at interactive rates, making it suitable for applications where high-resolution offline simulations would not be a viable option. The wrinkle synthesis method uses a precomputed database built by simulating the high-resolution clothing as the articulated figure is moved over a range of poses. In principle, the space of poses is exponential in the total number of degrees of freedom; however, clothing wrinkles are primarily affected by the nearest joints, allowing each joint to be processed independently. During synthesis, mesh interpolation is used to consider the influence of multiple joints, and combined with a coarse simulation to produce the final results at interactive rates.
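The per-joint structure described above (a precomputed wrinkle database indexed by joint angle, with each joint handled independently and the contributions combined) can be sketched as follows. The joint names, angles, and tiny displacement fields are invented for illustration; the paper operates on full garment meshes, not 5-vertex toy arrays.

```python
import numpy as np

NUM_VERTS = 5  # toy vertex count standing in for a garment mesh

# Hypothetical per-joint database: wrinkle displacement fields
# precomputed at a few sampled joint angles (here just two per joint).
db = {
    "elbow": {
        "angles": np.array([0.0, 90.0]),
        "wrinkles": np.array([np.zeros(NUM_VERTS),
                              [0.0, 0.3, 0.8, 0.3, 0.0]]),
    },
    "shoulder": {
        "angles": np.array([0.0, 90.0]),
        "wrinkles": np.array([np.zeros(NUM_VERTS),
                              [0.5, 0.2, 0.0, 0.0, 0.0]]),
    },
}

def synthesize(pose):
    """Interpolate each joint's precomputed wrinkles at the current pose
    and sum the independent joint contributions."""
    total = np.zeros(NUM_VERTS)
    for joint, angle in pose.items():
        a = db[joint]["angles"]
        w = db[joint]["wrinkles"]
        # Linear interpolation between the two nearest database poses.
        t = np.clip((angle - a[0]) / (a[1] - a[0]), 0.0, 1.0)
        total += (1 - t) * w[0] + t * w[1]
    return total

# Elbow halfway flexed, shoulder at rest: elbow wrinkles at half strength.
wrinkles = synthesize({"elbow": 45.0, "shoulder": 0.0})
print(wrinkles)
```

Treating each joint independently is what makes the database tractable: instead of sampling the exponential space of full poses, only a 1-D (or low-dimensional) sweep per joint is precomputed, and the runtime cost is a cheap blend added on top of the coarse simulation.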
Asymmetric jet correlations in pp↑ scattering
We propose that back-to-back correlations in azimuthal angle of jets produced
in collisions of unpolarized with transversely polarized proton beams could be
used to determine Sivers functions. The corresponding single-spin asymmetry is
not power-suppressed, but is subject to Sudakov suppression. We present
estimates of the asymmetry (without and with Sudakov effects) for RHIC at jet
transverse momenta of ~10 GeV and show that it may reach a few per cent or more
and could provide access to the gluon Sivers function.
Comment: 14 pages, LaTeX, 3 figures as epsi files. With minor additions to the first version. Accepted for publication in Phys. Rev.